Deep architectures for protein contact map prediction

Authors

  • Pietro Di Lena
  • Ken Nagata
  • Pierre Baldi
Abstract

MOTIVATION: Residue-residue contact prediction is important for protein structure prediction and other applications. However, the accuracy of current contact predictors often barely exceeds 20% on long-range contacts, falling short of the level required for ab initio structure prediction.

RESULTS: Here, we develop a novel machine learning approach for contact map prediction using three steps of increasing resolution. First, we use 2D recursive neural networks to predict coarse contacts and orientations between secondary structure elements. Second, we use an energy-based method to align secondary structure elements and predict contact probabilities between residues in contacting alpha-helices or strands. Third, we use a deep neural network architecture to organize and progressively refine the prediction of contacts, integrating information over both space and time. We train the architecture on a large set of non-redundant proteins and test it on a large set of non-homologous domains, as well as on the set of protein domains used for contact prediction in the two most recent CASP8 and CASP9 experiments. For long-range contacts, the accuracy of the new CMAPpro predictor is close to 30%, a significant increase over existing approaches.

AVAILABILITY: CMAPpro is available as part of the SCRATCH suite at http://scratch.proteomics.ics.uci.edu/.

CONTACT: [email protected]

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
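
To make the quantity reported in the abstract concrete, the following is a minimal sketch of how a contact map and a long-range contact accuracy could be computed. It is not the CMAPpro method. The 8 Å distance threshold, the sequence separation of at least 24 residues for "long-range", and scoring only the top L/5 predictions are assumptions borrowed from common CASP-style evaluation practice; the abstract does not state them. The function names `contact_map` and `long_range_accuracy` are hypothetical.

```python
import math


def contact_map(coords, threshold=8.0):
    """Build a symmetric boolean contact map from one 3D point per residue.

    Assumption: two residues are "in contact" when their representative
    atoms lie closer than `threshold` angstroms (8.0 is a common convention,
    not a value taken from the abstract).
    """
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = cmap[j][i] = True
    return cmap


def long_range_accuracy(probs, true_map, min_sep=24, top_fraction=0.2):
    """Fraction of the highest-scoring long-range predictions that are true contacts.

    Assumptions: "long-range" means sequence separation >= min_sep, and only
    the top L * top_fraction predictions are scored (L = sequence length).
    """
    n = len(true_map)
    # Collect every residue pair (i, j) with j - i >= min_sep.
    pairs = [(probs[i][j], i, j) for i in range(n) for j in range(i + min_sep, n)]
    if not pairs:
        return 0.0
    # Rank pairs by predicted contact probability, highest first.
    pairs.sort(key=lambda t: t[0], reverse=True)
    k = max(1, int(n * top_fraction))
    top = pairs[:k]
    hits = sum(1 for _, i, j in top if true_map[i][j])
    return hits / len(top)


if __name__ == "__main__":
    # Toy U-shaped chain of six points; min_sep is lowered so the example is small.
    coords = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (7.6, 5, 0), (3.8, 5, 0), (0, 5, 0)]
    truth = contact_map(coords)
    probs = [[0.1] * 6 for _ in range(6)]
    probs[0][5], probs[0][3], probs[1][4] = 0.9, 0.8, 0.7
    print(long_range_accuracy(probs, truth, min_sep=3, top_fraction=0.5))
```

Under these assumptions, an accuracy "close to 30%" would mean that roughly three in ten of the top-ranked long-range pairs are true contacts.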

Similar resources

Deep Spatio-Temporal Architectures and Learning for Protein Structure Prediction

Residue-residue contact prediction is a fundamental problem in protein structure prediction. However, despite considerable research efforts, contact prediction methods are still largely unreliable. Here we introduce a novel deep machine-learning architecture which consists of a multidimensional stack of learning modules. For contact prediction, the idea is implemented as a three-dimensional stack...

Prediction of Contact Maps by Recurrent Neural Network Architectures and Hidden Context Propagation From All Four Cardinal Corners

ABSTRACT Motivation: Accurate prediction of protein contact maps is an important step in computational structural proteomics. Because contact maps provide a translation and rotation invariant topological representation of a protein, they can be used as a fundamental intermediary step in protein structure prediction. Results: We develop a new set of flexible machine learning architectures for th...

Prediction of contact maps by GIOHMMs and recurrent neural networks using lateral propagation from all four cardinal corners

MOTIVATION Accurate prediction of protein contact maps is an important step in computational structural proteomics. Because contact maps provide a translation and rotation invariant topological representation of a protein, they can be used as a fundamental intermediary step in protein structure prediction. RESULTS We develop a new set of flexible machine learning architectures for the predict...

New Machine Learning Methods for the Prediction of Protein Topologies

Protein structures are translation and rotation invariant. In protein structure prediction, it is therefore important to be able to assess and predict intermediary topological representations, such as distance or contact maps, that are translation and rotation invariant. Here we develop several new machine learning methods for the prediction and assessment of fine-grained and coarse topological...

Striped sheets and protein contact prediction

MOTIVATION Current approaches to contact map prediction in proteins have focused on amino acid conservation and patterns of mutation at sequentially distant positions. This sequence information is poorly understood and very little progress has been made in this area during recent years. RESULTS In this study, an observation of 'striped' sequence patterns across beta-sheets prompted the develo...

Journal:
  • Bioinformatics

Volume 28, Issue 19

Pages: -

Publication date: 2012